High throughput AES architecture

ABSTRACT

An advanced encryption standard (AES) architecture includes a maximum parallel encryption module which implements one round of the AES algorithm in one clock cycle, and a maximum parallel key scheduling module which generates sub-keys in one clock cycle in parallel with the encryption module, thereby permitting feedback modes of operation to be used without adversely affecting AES throughput. A controller controls the operation of the encryption and key scheduling modules such that one round is completed per clock cycle. The controller is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs). The architecture also preferably includes asynchronous input and output buffers.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to the field of encryption systems, and particularly to advanced encryption standard (AES) architectures.

[0003] 2. Description of the Related Art

[0004] The advanced encryption standard (AES) is a new encryption standard which implements the Rijndael algorithm. The Rijndael algorithm accepts data blocks and key sizes of 128, 192, or 256 bits; the AES implementation is a symmetric block cipher with 128 bit data blocks and a key size that can be chosen from 128, 192, or 256 bits.

[0005] Several possible implementation modes of the AES standard are shown in FIG. 1. The AES algorithm may be employed as an electronic code book (ECB) which receives plaintext (P) and produces an encrypted output (C). The algorithm may also be employed in one of several feedback modes of operation; such feedback modes include Cipher Block Chaining (CBC), Cipher Feedback (CFB), and Output Feedback (OFB).

[0006] Ideally, an implementation of the AES standard will have a high data rate. Several AES designs have been proposed to achieve a high data rate based on pipelined architectures. These work well when employing the AES algorithm as an ECB, with no feedback. However, the AES standard is most often used in the feedback modes of operation; in these modes, the output of the AES algorithm is fed back to the input. Unfortunately, this arrangement is incompatible with pipeline structures, due to the long latency of each pipeline path.
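By way of illustration only, the latency argument above can be sketched with a simplified cycle-count model. The function names, the figure of 10 rounds, and the block count are assumptions chosen for the example; the model ignores I/O, the initial key addition, and any difference in clock period between designs.

```python
def cycles_pipelined(blocks: int, rounds: int, feedback: bool) -> int:
    """Clock cycles for a pipeline with one stage per round (simplified model)."""
    if feedback:
        # Block i+1 depends on the ciphertext of block i, so it cannot
        # enter the pipeline until block i has left the last stage.
        return blocks * rounds
    # No feedback (ECB): a new block can enter on every clock cycle.
    return rounds + blocks - 1


def cycles_iterative(blocks: int, rounds: int) -> int:
    """Clock cycles when a single round of hardware is reused for every round."""
    return blocks * rounds


# Hypothetical workload: 100 blocks, 10 rounds per block.
ecb_pipelined = cycles_pipelined(100, 10, feedback=False)
cbc_pipelined = cycles_pipelined(100, 10, feedback=True)
cbc_iterative = cycles_iterative(100, 10)
```

Under these assumptions the pipeline only helps when there is no feedback; with feedback it needs as many cycles as a single re-used round, while costing one stage of hardware per round.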

SUMMARY OF THE INVENTION

[0007] An AES architecture is presented which overcomes the problems noted above. High throughput is achieved, even when the AES algorithm is employed with one of the feedback modes of operation.

[0008] The present invention is a low latency, non-pipelined AES architecture. Hardware is provided for one encryption round, which is re-used as needed to complete the encryption process. This permits feedback modes to be used without adversely affecting AES throughput.

[0009] The present architecture requires a maximum parallel encryption module, which is arranged to implement one round of the AES algorithm in one clock cycle. It also requires a maximum parallel key scheduling module, arranged to generate sub keys in one clock cycle in parallel with the encryption module. The encryption and key scheduling modules are preferably made from combinatorial logic blocks, replicated as necessary to achieve one round per clock cycle.

[0010] A controller controls the operation of the encryption and key scheduling modules such that one round of the AES algorithm is completed per clock cycle. The controller is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).

[0011] Further features and advantages of the invention will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a diagram showing known implementations of various modes of operation used in combination with the AES algorithm.

[0013] FIG. 2 is a block diagram of an AES architecture in accordance with the present invention.

[0014] FIG. 3 is a block diagram of a maximum parallel encryption data path in accordance with the present invention.

[0015] FIG. 4a is a block diagram of a key scheduling architecture as might be used with an AES architecture in accordance with the present invention, which accommodates data block lengths of 128, 192 or 256 bits.

[0016] FIG. 4b is an alternative embodiment of a key scheduling architecture as might be used with an AES architecture in accordance with the present invention, which accommodates a data block length of 128 bits.

[0017] FIG. 5 is a block diagram of a hierarchical distributed control scheme as might be used with an AES architecture in accordance with the present invention.

[0018] FIGS. 6a-6c illustrate the operation of the present AES architecture in three different feedback modes of operation.

DETAILED DESCRIPTION OF THE INVENTION

[0019] An AES architecture in accordance with the present invention is shown in FIG. 2. When properly arranged, the present architecture provides high throughput, even with feedback modes of operation. At the heart of the architecture are an encryption module 10, a key scheduling module 12, and a controller 14. Encryption module 10 is made maximum parallel; i.e., all operations that can occur in parallel, do occur in parallel. This means that every bit of an N-bit data block is processed simultaneously through the encryption module. Thus, if the data block length N is chosen to be 256 bits, the encryption module receives and processes all 256 bits at once. Furthermore, the encryption module implements one round of the AES algorithm in one clock cycle.

[0020] The key scheduling module 12 is also made maximum parallel, such that the sub-keys required by encryption module 10 are generated in one clock cycle, in parallel with the encryption module.

[0021] Encryption module 10 and key scheduling module 12 are controlled via controller 14. The controller is adapted to operate encryption module 10 and key scheduling module 12 to perform one round of the AES algorithm in one clock cycle. Controller 14 is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs), such as an input FSM 15 and an output FSM 16 which control the operation of an input buffer 17 and an output buffer 18, respectively. The controller preferably also communicates with the outside world via input commands and output status bits. The control scheme preferably also includes FSMs 19 and 20, which control the operation of encryption module 10 and key scheduling module 12, respectively, and may be internal or external to their respective modules. Controller 14 preferably also includes an FSM 22; the controller's implementation is discussed in more detail in relation to FIG. 5, below.

[0022] The key is provided to key scheduling module 12 either via the input port and encryption module, or (as shown in FIG. 2) via a separate port; in either case, the key is stored in a key register 26. When the key is provided via a separate port, the system also includes a key entry buffer 27 which is controlled with its own FSM 28.

[0023] When arranged as described above, the present AES architecture provides low latency and high throughput, even when used with feedback modes of operation.

[0024] The architecture also preferably includes asynchronous input and output buffers, which implement a full handshake. Asynchronous input buffer 17 loads X-bit data bytes to be encrypted (P), places them in parallel in an N-bit internal register 24, and presents the N bits to the input of encryption module 10 simultaneously. Similarly, asynchronous output buffer 18 receives the N-bit output from encryption module 10 and outputs encoded X-bit data bytes (C) to an output bus. This arrangement decouples the external I/O operations, i.e., the loading and unloading of data, from the internal operation of the encryption core (modules 10 and 12). This allows the input and output busses to be any width compared to the internal input and output registers. Thus, the encryption core can be used in an environment in which the number of pins is limited (e.g., an 8-bit bus or a serial link), as well as with high speed parallel busses (e.g., 64, 128 or 256 bits). Another benefit afforded by the preferred asynchronous input and output buffers is that they enable a slow input and/or output to still be combined with fast internal operation, with the handshaking stretched over a large number of clock cycles to accommodate the slow interface.
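The width conversion performed by the input and output buffers can be sketched as follows. This is a minimal software model only; the function names are hypothetical, and the ordering (first word received occupies the most significant bits of the N-bit register) is an assumption not stated in the description.

```python
def pack_words(words, x_bits, n_bits):
    """Assemble X-bit bus words into one N-bit register value (models input buffer 17)."""
    if n_bits % x_bits:
        raise ValueError("N must be a multiple of X")
    if len(words) != n_bits // x_bits:
        raise ValueError("wrong number of words for one block")
    reg = 0
    for w in words:
        if not 0 <= w < (1 << x_bits):
            raise ValueError("word does not fit in X bits")
        reg = (reg << x_bits) | w   # assumed order: first word is most significant
    return reg


def unpack_words(reg, x_bits, n_bits):
    """Split an N-bit value back into X-bit bus words (models output buffer 18)."""
    count = n_bits // x_bits
    mask = (1 << x_bits) - 1
    return [(reg >> (x_bits * (count - 1 - i))) & mask for i in range(count)]


# An 8-bit bus feeding a 128-bit register takes 16 transfers per block.
block = pack_words(list(range(16)), x_bits=8, n_bits=128)
```

The same two functions cover a narrow bus (X=8) and a wide one (X=64 or X=128); only the number of transfers per block changes, which is the decoupling described above.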

[0025] One possible implementation for encryption module 10 is shown in FIG. 3. The module includes four different sub-modules: a substitution sub-module 30, a shift row sub-module 32, a mix column sub-module 34, and a key addition sub-module 36; the functionality of each sub-module is defined in the AES standard. To achieve high throughput and low latency, the sub-modules are preferably implemented with combinatorial logic and lookup tables, and the data path is made wide enough to accommodate the entire data block length of 128, 192, or 256 bits. The data path may be wider than that shown in FIG. 3; for example, the path would be twice as wide as that shown to accommodate a data block length of 256 bits.

[0026] For substitution sub-module 30, the incoming data bits are preferably divided into 8-bit bytes, each of which is used to address an S-box lookup table. Each S-box contains 256 8-bit entries. To provide maximum parallelism and to finish one round of encryption in one clock cycle, the same S-box is replicated 32 times for an expected data block length of 256 bits. The S-box is replicated 16 or 24 times for expected data block lengths of 128 or 192 bits, respectively.
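As a sketch of the substitution step, the S-box table can be generated from its definition in the AES standard (multiplicative inverse in GF(2^8) followed by an affine transform) and applied to every byte of the block. The function names are illustrative; in hardware each byte would address its own replicated table, whereas this software model simply indexes one table once per byte.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def build_sbox():
    """Build the 256-entry, 8-bit S-box lookup table."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):        # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        s = 0x63                        # affine transform constant
        for shift in range(5):          # inv XOR its rotations by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def sub_bytes(block: bytes) -> bytes:
    """One table lookup per byte: 16, 24 or 32 lookups for 128, 192 or 256 bits."""
    return bytes(SBOX[b] for b in block)
```

Because the lookups are independent of one another, replicating the table once per byte lets all of them complete in the same clock cycle.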

[0027] For shift row sub-module 32, the 256 bits of incoming data (assuming a maximum expected data block length of 256 bits) are preferably divided into four 64 bit chunks, each of which is called a “row” and contains eight bytes. Byte-wise cyclic shifts are performed on each row, with the amount of shift determined by the block length through a lookup table, as defined in the AES standard.
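The row shift can be sketched as below. The offset table is an assumption taken from the Rijndael specification as commonly published (row offsets 0, 1, 2, 3 for 128 and 192 bit blocks, and 0, 1, 3, 4 for 256 bit blocks) and should be checked against the governing standard; the names are illustrative.

```python
# Assumed shift offsets per row, keyed by the number of bytes in a row
# (4, 6 or 8 bytes for block lengths of 128, 192 or 256 bits).
SHIFT_OFFSETS = {
    4: (0, 1, 2, 3),
    6: (0, 1, 2, 3),
    8: (0, 1, 3, 4),
}


def shift_rows(rows):
    """Cyclically shift each of the four rows left by its looked-up offset."""
    if len(rows) != 4:
        raise ValueError("the state has exactly four rows")
    offsets = SHIFT_OFFSETS[len(rows[0])]
    return [row[off:] + row[:off] for row, off in zip(rows, offsets)]


# 128-bit example: four rows of four bytes each.
state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
shifted = shift_rows(state)
```

In hardware this step is pure wiring selected by the block length, so it adds no logic depth to the round.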

[0028] For mix column sub-module 34, matrix multiplication is performed on the shifted bytes in accordance with the mix column definition specified in the AES standard, using combinatorial logic; four, six, or eight blocks are used for data block lengths of 128, 192, or 256 bits, respectively.
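A minimal sketch of one mix column block follows, using the fixed matrix with coefficients 2, 3, 1, 1 from the AES standard. The function names are illustrative. One such block handles one 32-bit column, so four, six, or eight independent copies cover a 128, 192, or 256 bit block.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8), reducing by 0x11B on overflow."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """Multiply one 4-byte column by the AES mix column matrix."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,    # 2*a0 + 3*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,    # a0 + 2*a1 + 3*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),    # a0 + a1 + 2*a2 + 3*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),    # 3*a0 + a1 + a2 + 2*a3
    ]


def mix_columns(columns):
    """Apply the same block to every column; the columns are independent."""
    return [mix_column(c) for c in columns]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Each output byte is a short XOR network of the inputs, which is why the step lends itself to combinatorial logic.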

[0029] Finally, key addition sub-module 36 exclusive-OR's the mix column output with the sub-keys received from key scheduling module 12, as prescribed by the AES standard, to generate the encrypted output. Sub-module 36 uses 128, 192 or 256 exclusive-OR gates to produce an output of 128, 192 or 256 bits, respectively.
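The key addition amounts to a bitwise exclusive-OR across the full data path, which can be sketched with N-bit integers. The function name is illustrative; the sample values are the example input block and cipher key given in Appendix B of FIPS 197.

```python
def add_round_key(state: int, roundkey: int, n_bits: int) -> int:
    """XOR an N-bit state with an N-bit roundkey (one XOR gate per bit in hardware)."""
    mask = (1 << n_bits) - 1
    return (state ^ roundkey) & mask


state = 0x3243F6A8885A308D313198A2E0370734
key = 0x2B7E151628AED2A6ABF7158809CF4F3C
result = add_round_key(state, key, 128)
```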

[0030] Maximum parallel key scheduling module 12 has a data path wide enough to accommodate the maximum expected key length. Sub-keys are generated on the fly, in one clock cycle and in parallel with the encryption module. Key scheduling module 12 is arranged to accommodate the different key and block lengths allowed by the Rijndael algorithm or the AES standard, as necessary. The Rijndael algorithm allows block lengths and key lengths of 128, 192 and 256 bits, while the AES standard limits the block length to 128 bits. For the former case, the key scheduling module 12 is arranged to accommodate the nine different key length and block length combinations, and operates as defined in the Rijndael algorithm. For the latter case, only three combinations must be accommodated, with operation of the key scheduling module defined in the AES standard.

[0031] The present architecture can support a chosen combination of key-length k and data block length N, which may require differing numbers of key schedule iterations and round transformations. As noted above, one round transformation per clock cycle is required. Consequently, the speed of the key-scheduling process must be adapted as k and N change. Depending on the parameter values, it may be necessary to complete 0, 1 or 2 key scheduling iterations per clock cycle to keep up with 1 round transformation per clock cycle. For example, when 256 bit data blocks and 128 bit sub-keys are used (N=256, k=128), 2 key schedule iterations are needed for each round transformation. Non-integral rates can also occur: for example, if N=192 and k=128, an average of 1.5 key schedule iterations is required per round transformation.

[0032] One key scheduling architecture capable of accommodating these combinations is shown in FIG. 4a. Key scheduling module 12 has to provide one N-bit roundkey per clock cycle to encryption module 10. The roundkey is constructed out of k-bit sub-keys. When N is larger than k, multiple sub-keys are required within one clock cycle. The use of two key scheduling blocks 40 and 42 allows two iterations of the key schedule to be evaluated in one clock cycle. The N-bit roundkey is assembled out of k-bit sub-keys P, C, and N, produced by the previous, current, and next key schedule iterations. Assembly of the roundkey is under the control of a key schedule controller 44. Controller 44 also steers the pace of the key schedule iterations by selecting which sub-key is used as the iterated key: when the P sub-key is selected, the key schedule does not advance; when the C sub-key is selected, one iteration is taken per clock cycle; and when the N sub-key is selected, two iterations are taken per clock cycle.
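The pacing decision can be sketched with a simple bit-budget model. This is an illustrative assumption rather than the logic of controller 44: it supposes that every key schedule iteration yields k fresh bits, that every clock cycle consumes N bits for the roundkey, and that the schedule starts with nothing buffered. The function name and the mapping table are hypothetical; the labels P, C and N follow the text above.

```python
def schedule_pace(n_bits, k_bits, cycles):
    """Return (iterations, selected sub-key) for each clock cycle."""
    select_for = {0: "P", 1: "C", 2: "N"}   # 0, 1 or 2 iterations per cycle
    buffered = 0                            # key bits generated but not yet used
    plan = []
    for _ in range(cycles):
        iterations = 0
        while buffered < n_bits:            # iterate until one roundkey is covered
            buffered += k_bits
            iterations += 1
        buffered -= n_bits                  # the roundkey consumes N bits
        plan.append((iterations, select_for[iterations]))
    return plan


wide_block = schedule_pace(256, 128, 4)     # N=256, k=128
mixed_rate = schedule_pace(192, 128, 4)     # N=192, k=128
wide_key = schedule_pace(128, 256, 4)       # N=128, k=256
```

In this model a non-integral rate appears as an alternating pattern of selections, so the average number of iterations per cycle equals N/k while no single cycle ever needs more than two.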

[0033] A simplified key scheduling architecture may be used when only three key and block length combinations must be accommodated; such an architecture is shown in FIG. 4b. Here, only one key scheduling block 52 is required to produce the 128 bit roundkey required by the encryption module.
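For the simplest of the three combinations (N=128, k=128), one key schedule iteration produces exactly one roundkey, and that iteration can be sketched as follows. The code follows the key expansion defined in the AES standard for 128 bit keys; the function names are illustrative, and the 192 and 256 bit key cases, which need additional roundkey assembly, are not shown.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def build_sbox():
    """S-box from its definition: field inverse, then affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, x)
        s = 0x63
        for shift in range(5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def next_subkey_128(prev: bytes, rcon: int) -> bytes:
    """One key schedule iteration: previous 128-bit sub-key -> current sub-key."""
    words = [prev[i:i + 4] for i in range(0, 16, 4)]
    t = words[3][1:] + words[3][:1]             # rotate the last word by one byte
    t = bytes(SBOX[b] for b in t)               # substitute each byte
    t = bytes([t[0] ^ rcon]) + t[1:]            # add the round constant
    out = []
    for word in words:
        t = bytes(x ^ y for x, y in zip(word, t))   # chain the XOR through 4 words
        out.append(t)
    return b"".join(out)


key0 = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
key1 = next_subkey_128(key0, 0x01)
```

The whole iteration is four S-box lookups plus XORs, so it fits comfortably alongside one encryption round in the same clock cycle.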

[0034] As noted above, controller 14 is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs); this avoids having the controller logic in the critical path, which might slow down the system. Such a control scheme is shown in FIG. 5: main FSM 14 receives instructions from and provides status to the outside world, and decomposes the instructions into detailed micro instructions (M.I.) for the different local FSMs, such as input FSM 15, encryption FSM 19, key scheduling FSM 20, and output FSM 16. The local FSMs provide control signals (C.S.) to their respective modules, and provide status back to the main FSM. Each FSM preferably operates off of a single clock (CLK). This approach has the advantage that each of the FSMs can be kept small, and thus fast.
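The division of labor between a main FSM and local FSMs can be sketched in software as follows. The state names, the cycle counts, and the class and function names are all assumptions made for the example; only the general pattern (the main FSM issues a micro instruction, the local FSMs run on the shared clock and report status back) is taken from the description.

```python
from enum import Enum, auto


class MainState(Enum):
    IDLE = auto()
    LOAD = auto()
    ENCRYPT = auto()
    UNLOAD = auto()


class LocalFSM:
    """Small local FSM: busy for `latency` clocks after a micro instruction."""

    def __init__(self, name, latency):
        self.name = name
        self.latency = latency
        self.remaining = 0

    def issue(self):
        """Micro instruction (M.I.) from the main FSM."""
        self.remaining = self.latency

    def clock(self):
        """Advance one cycle of the shared clock."""
        if self.remaining:
            self.remaining -= 1

    @property
    def done(self):
        """Status reported back to the main FSM."""
        return self.remaining == 0


def run_block(rounds=10, load_cycles=2, unload_cycles=2):
    """Trace the main FSM state on every clock while one block is processed."""
    phases = [
        (MainState.LOAD, [LocalFSM("input", load_cycles)]),
        # Encryption and key scheduling run in parallel, one round per clock.
        (MainState.ENCRYPT, [LocalFSM("encryption", rounds),
                             LocalFSM("key scheduling", rounds)]),
        (MainState.UNLOAD, [LocalFSM("output", unload_cycles)]),
    ]
    trace = []
    for state, workers in phases:
        for fsm in workers:
            fsm.issue()
        while not all(fsm.done for fsm in workers):
            for fsm in workers:
                fsm.clock()
            trace.append(state)
    trace.append(MainState.IDLE)
    return trace


trace = run_block()
```

Each local FSM here holds only a counter, which reflects the stated advantage of the scheme: no single machine has to track the state of the whole system.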

[0035] Note that the implementations of the control scheme, key scheduling module, and encryption module shown above are merely exemplary. Other designs could be used to implement these functions in accordance with the definitions given in the AES standard, as long as the encryption and key scheduling modules are made maximum parallel, and the architecture can implement one round of the AES algorithm in one clock cycle.

[0036] As noted above, the present AES architecture can be used with one of the feedback modes of operation. This is illustrated in FIGS. 6a-6c, which show only the circuitry immediately around encryption module 10. In FIG. 2, the architecture is arranged to implement the electronic code book (ECB) mode of operation. In FIG. 6a, the Cipher Block Chaining (CBC) feedback mode of operation is implemented between register 24 and encryption module 10. In FIG. 6b, the Output Feedback (OFB) mode of operation is illustrated, while in FIG. 6c, the Cipher Feedback (CFB) feedback mode is shown. Other feedback modes may be accommodated in a similar fashion.
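The three feedback arrangements can be sketched around a stand-in block function. The function toy_encrypt below is NOT AES; it is a hypothetical placeholder for encryption module 10, used only so that the data flow of each mode can be run and checked. Full-block feedback is assumed for CFB and OFB, and all names are illustrative.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Placeholder for the encryption module (NOT AES): XOR, then rotate a byte."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def cbc_encrypt(blocks, key, iv):
    """CBC: the previous ciphertext is XORed into the next plaintext."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(xor(p, prev), key)
        out.append(prev)
    return out


def ofb_encrypt(blocks, key, iv):
    """OFB: the cipher output is fed back and XORed with the plaintext."""
    out, feedback = [], iv
    for p in blocks:
        feedback = toy_encrypt(feedback, key)
        out.append(xor(p, feedback))
    return out


def cfb_encrypt(blocks, key, iv):
    """CFB: the previous ciphertext is encrypted and XORed with the plaintext."""
    out, prev = [], iv
    for p in blocks:
        prev = xor(p, toy_encrypt(prev, key))
        out.append(prev)
    return out


key = bytes(range(16))
iv = bytes(16)
plain = [b"A" * 16, b"A" * 16]
cbc = cbc_encrypt(plain, key, iv)
ofb = ofb_encrypt(plain, key, iv)
cfb = cfb_encrypt(plain, key, iv)
```

In every loop the input to the block function for one block depends on the result for the previous block, which is the dependency that a low latency, single-round design is intended to serve.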

[0037] While particular embodiments of the invention have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. Accordingly, it is intended that the invention be limited only in terms of the appended claims. 

I claim:
 1. An advanced encryption standard (AES) architecture which provides high throughput and low latency, comprising: a maximum parallel encryption module arranged to receive a plurality of data bytes to be encrypted and to implement one round of the AES algorithm in one clock cycle, a maximum parallel key scheduling module arranged to generate sub-keys in one clock cycle in parallel with said encryption module, said sub-keys provided to said encryption module, and a controller arranged to control the operation of said maximum parallel encryption and key scheduling modules such that said architecture performs one round of the AES algorithm in one clock cycle.
 2. The AES architecture of claim 1, further comprising: an asynchronous input buffer arranged to receive data bytes to be encrypted, to buffer multiple ones of said data bytes in parallel, and to provide said parallel data bytes to said maximum parallel encryption module, and an asynchronous output buffer arranged to receive the output of said maximum parallel encryption module and to output said encrypted data bytes to an output bus.
 3. The AES architecture of claim 2, wherein said maximum parallel encryption module comprises: a substitution sub-module comprising substitution blocks which are replicated as needed to receive all of said parallel data bytes from said input buffer simultaneously, a shift row sub-module which receives the outputs of said substitution sub-module, a mix column sub-module which receives the outputs of said shift row sub-module, and a key addition sub-module arranged to receive and combine the outputs of said mix column sub-module and said sub-keys from said key scheduling module, and to provide the results at an output, said output being the output of said maximum parallel encryption module.
 4. The AES architecture of claim 3, wherein said maximum parallel encryption and key scheduling modules are implemented exclusively with combinatorial logic.
 5. The AES architecture of claim 2, wherein said controller is implemented with a hierarchical distributed control scheme comprising communicating finite state machines (FSMs), comprising: a main FSM, and local FSMs which are controlled by said main FSM, said local FSMs comprising: a maximum parallel encryption module FSM which controls said maximum parallel encryption module, a key scheduling module FSM which controls said key scheduling module, an input buffer FSM which controls said input buffer, and an output buffer FSM which controls said output buffer.
 6. The AES architecture of claim 1, wherein said controller is implemented with a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).
 7. The AES architecture of claim 1, wherein said architecture employs a chosen key-length k and a data-block length m, said key scheduling module comprising: a first key scheduling sub-module arranged to receive the previously-generated sub-key and to generate the current sub-key, a second key scheduling sub-module arranged to receive the current sub-key and to generate the next sub-key, and a roundkey assembly sub-module which receives said previously-generated sub-key, said current sub-key, said next sub-key and is arranged to assemble an m-bit roundkey from said k-bit sub-keys and provide said roundkey to said encryption module.
 8. The AES architecture of claim 1, wherein said architecture implements the AES standard with 128 bit data-blocks and a chosen key-length k, said key scheduling module comprising: a key scheduling sub-module arranged to receive the previously-generated sub-key and to generate the current sub-key, and a roundkey assembly sub-module which receives said previously-generated sub-key and said current sub-key and is arranged to assemble a 128-bit roundkey from said sub-keys, said roundkey provided to said encryption module.
 9. The AES architecture of claim 1, wherein said architecture implements the Rijndael algorithm with a data-block length of 128, 192 or 256 bits and a key-length of 128, 192 or 256 bits.
 10. The AES architecture of claim 1, wherein said architecture implements the AES standard with a data-block length of 128 bits and a key-length of 128, 192 or 256 bits.
 11. The AES architecture of claim 1, wherein said architecture is arranged to implement the electronic code book (ECB) mode of operation.
 12. The AES architecture of claim 1, wherein said architecture is arranged to implement a feedback mode of operation.
 13. An advanced encryption standard (AES) architecture which provides high throughput and low latency, comprising: an asynchronous input buffer arranged to receive data bytes to be encrypted, to buffer multiple ones of said data bytes in parallel, and to provide said parallel data bytes at an output, a maximum parallel encryption module arranged to receive the output of said input buffer and to implement one round of the AES algorithm in one clock cycle, a maximum parallel key scheduling module arranged to generate sub-keys in one clock cycle in parallel with said encryption module, an asynchronous output buffer arranged to receive the output of said maximum parallel encryption module and to output said encrypted data bytes to an output bus, and a controller arranged to control the operation of said maximum parallel encryption and key scheduling modules such that said architecture performs one round of the AES algorithm in one clock cycle, wherein said controller is a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).
 14. The AES architecture of claim 13, wherein said maximum parallel encryption module comprises: a substitution sub-module comprising substitution blocks which are replicated as needed to receive all of said parallel data bytes from said input buffer simultaneously, a shift row sub-module which receives the outputs of said substitution sub-module, a mix column sub-module which receives the outputs of said shift row sub-module, and a key addition sub-module arranged to receive and combine the outputs of said mix column sub-module and said sub-keys from said key scheduling module, and to provide the results at an output, said output being the output of said maximum parallel encryption module, each of said maximum parallel encryption module sub-modules implemented exclusively with combinatorial logic.
 15. The AES architecture of claim 13, wherein said communicating FSMs comprise: a main FSM, and local FSMs which are controlled by said main FSM, said local FSMs comprising: a maximum parallel encryption module FSM which controls said maximum parallel encryption module, a key scheduling module FSM which controls said key scheduling module, an input buffer FSM which controls said input buffer, and an output buffer FSM which controls said output buffer.
 16. The AES architecture of claim 13, wherein said architecture employs a chosen key-length k and a data-block length m, said key scheduling module comprising: a first key scheduling sub-module arranged to receive the previously-generated sub-key and to generate the current sub-key, a second key scheduling sub-module arranged to receive the current sub-key and to generate the next sub-key, and a roundkey assembly sub-module which receives said previously-generated sub-key, said current sub-key, said next sub-key and is arranged to assemble an m-bit roundkey from said k-bit sub-keys and provide said roundkey to said encryption module.
 17. The AES architecture of claim 13, wherein said architecture implements the AES standard with 128 bit data-blocks and a chosen key-length k, said key scheduling module comprising: a key scheduling sub-module arranged to receive the previously-generated sub-key and to generate the current sub-key, and a roundkey assembly sub-module which receives said previously-generated sub-key and said current sub-key and is arranged to assemble a 128-bit roundkey from said sub-keys, said roundkey provided to said encryption module.
 18. The AES architecture of claim 13, wherein said architecture implements the Rijndael algorithm with a data-block length of 128, 192 or 256 bits and a key-length of 128, 192 or 256 bits.
 19. The AES architecture of claim 13, wherein said architecture implements the AES standard with a data-block length of 128 bits and a key-length of 128, 192 or 256 bits.
 20. The AES architecture of claim 13, wherein said architecture is arranged to implement the electronic code book (ECB) mode of operation.
 21. The AES architecture of claim 13, wherein said architecture is arranged to implement a feedback mode of operation.
 22. The AES architecture of claim 21, wherein said architecture is arranged to implement the Cipher Block Chaining (CBC) feedback mode of operation.
 23. The AES architecture of claim 21, wherein said architecture is arranged to implement the Cipher Feedback (CFB) feedback mode of operation.
 24. The AES architecture of claim 21, wherein said architecture is arranged to implement the Output Feedback (OFB) feedback mode of operation. 